17. Your First (Overfit) POI Identifier

Your First (Overfit) POI Identifier

Question:

You’ll start by building the simplest imaginable (unvalidated) POI identifier. The starter code ( validation/validate_poi.py ) for this lesson is pretty bare--all it does is read in the data, and format it into lists of labels and features. Create a decision tree classifier (just use the default parameters), train it on all the data (you will fix this in the next part!), and print out the accuracy. THIS IS AN OVERFIT TREE, DO NOT TRUST THIS NUMBER! Nonetheless, what’s the accuracy?

Start Quiz:

INSTRUCTOR NOTE:

From Python 3.3 forward, a change to the order in which dictionary keys are processed was made such that the orders are randomized each time the code is run. This will cause some compatibility problems with the graders and project code, which were run under Python 2.7. To correct for this, add the following argument to the featureFormat call on line 25 of validate_poi.py :

sort_keys = '../tools/python2_lesson13_keys.pkl'

This will open up a file in the tools folder with the Python 2 key order.

Note: If you are not getting the results expected by the grader, then you may want to check the file tools/feature_format.py . Due to changes in the final project, some file changes have affected the numbers output on this assignment as written. Check that you have the most recent version of the file from the repository, such that the featureFormat has a default parameter for sort_keys = False and that keys = dictionary.keys() results.